Deriving meaningful rules from gene expression data for classification
Abstract
We propose a novel scheme for designing fuzzy rule-based classifiers for gene expression data analysis. A neural-network-based method is used to select a set of informative genes. Using only the selected genes, we cluster the expression data with a fuzzy clustering algorithm. Each cluster is then converted into a fuzzy if-then rule that models a region of the input space. These rules are tuned with a gradient descent technique to improve classification performance. The rule base is tested on a two-class leukemia data set and is found to produce excellent results. The membership functions associated with the rules are then analyzed, and the rule base is further simplified without compromising classification accuracy. The most attractive attributes of the proposed scheme are: it extracts rules automatically; unlike many other classifiers, it produces human-interpretable rules; and it is not expected to generalize poorly, because fuzzy rules do not respond to regions of the input space that are not represented by the training data.
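The abstract does not spell out the individual algorithms, so the following is only a rough illustration of the cluster-to-rule idea, not the authors' method. The sketch assumes fuzzy c-means as the clustering algorithm (the abstract does not name one). It also assumes that each class is clustered separately, so every cluster centre becomes the antecedent of a rule labelled with that class, that Gaussian membership functions use the within-class standard deviation of each gene as their spread, and that the product combines memberships. It omits the neural-network gene selection and the gradient-descent tuning. The function names (`fcm`, `build_rules`, `log_firing`, `firing`, `classify`) are hypothetical, and the two-gene data is synthetic, not the leukemia data set.

```python
import math
import random

# Minimal sketch; all names are hypothetical and the toy data below is synthetic.


def fcm(points, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: return c cluster centres for a list of equal-length vectors."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial membership matrix u[i][k]; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centres = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u[i][k] ** m.
        centres = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centres.append([sum(w[i] * points[i][j] for i in range(n)) / wsum
                            for j in range(d)])
        # Membership update: u[i][k] = 1 / sum_l (d_ik / d_il) ** (2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], v), 1e-12) for v in centres]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1))
                                    for l in range(c))
    return centres


def build_rules(X, y, clusters_per_class=1, sigma_floor=0.1):
    """Cluster each class separately; each centre becomes one rule (centre, spreads, label)."""
    rules = []
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        d = len(pts[0])
        means = [sum(p[j] for p in pts) / len(pts) for j in range(d)]
        # Assumed spread: within-class standard deviation per gene, floored.
        sigmas = [max(math.sqrt(sum((p[j] - means[j]) ** 2 for p in pts) / len(pts)),
                      sigma_floor) for j in range(d)]
        for centre in fcm(pts, clusters_per_class):
            # Rule: IF gene_1 is close to centre[0] AND ... THEN class = label.
            rules.append((centre, sigmas, label))
    return rules


def log_firing(rule, x):
    """Log of the product of Gaussian memberships (product used for AND)."""
    centre, sigmas, _ = rule
    return sum(-((xj - vj) ** 2) / (2 * s ** 2) for xj, vj, s in zip(x, centre, sigmas))


def firing(rule, x):
    """Firing strength between 0 and 1; it decays toward 0 far from the rule's centre."""
    return math.exp(log_firing(rule, x))


def classify(rules, x):
    # Winner-take-all: class label of the most strongly firing rule.
    # Comparing log-firing values avoids underflow when many genes are used.
    return max(rules, key=lambda r: log_firing(r, x))[2]


X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [4.0, 4.2], [3.9, 3.8], [4.2, 4.1]]
y = [0, 0, 0, 1, 1, 1]
rules = build_rules(X, y)
print(classify(rules, [1.0, 1.0]), classify(rules, [4.0, 4.0]))  # 0 1
```

Each Gaussian membership decays toward zero away from its centre, so a rule fires only weakly for inputs far from the training data it was built from. This is one way to read the abstract's generalization argument. Tuning the centres and spreads by gradient descent, as the abstract describes, would be a further step on top of this sketch.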
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 19
Published: 2008